test(framework): add dual DB engine coverage with flaky test fixes#6586
test(framework): add dual DB engine coverage with flaky test fixes#6586vividctrlalt wants to merge 6 commits intotronprotocol:developfrom
Conversation
- Extract shared test logic into DbDataSourceImplTest, BaseMethodTest, TestConstants - Enable DB engine override via system property so tests run against both engines - Refactor 20+ test classes to support dual-engine testing - Fix flaky tests: FreezeTest, FreezeV2Test, VoteTest, ManagerTest, ShieldedReceiveTest - Update build.gradle and CI workflow accordingly
| private final Map<String, Sha256Hash> dbRoots = Maps.newConcurrentMap(); | ||
|
|
||
| public static String getDbEngineFromConfig(final Config config) { | ||
| if (Arch.isArm64()) { |
There was a problem hiding this comment.
This change and validateConfig introduce a regression: on ARM64, starting the node with the default config.conf (where db.engine is set to LEVELDB) now results in a startup failure.
The issue can be reproduced as follows:
$ ./gradlew :framework:buildFullNodeJar
…..
Building for architecture: aarch64, Java version: 17
…..
$ java -jar framework/build/libs/FullNode.jar
$ tail -f logs/tron.log
…..
15:17:21.876 ERROR [main] [Exit](ExitManager.java:49) Shutting down with code: PARAMETER_INIT(1), reason: ARM64 architecture only supports RocksDB. Current engine: LEVELDBThere was a problem hiding this comment.
Agreed. I've opened an issue to continue the discussion: #6587
There was a problem hiding this comment.
I believe fail-fast is the correct behavior here. If a user runs on ARM64 with db.engine=LEVELDB, the node should refuse to start with a clear error message, rather than silently falling back to RocksDB or deferring the failure to runtime.
The principle is: no hidden behavior. If you're running on ARM64, you must explicitly configure a supported engine. A silent fallback would mask a misconfiguration and could lead to confusion later (e.g., user expects LevelDB but data is actually stored in RocksDB).
That said, I understand the concern about the default config.conf shipping with LEVELDB. This is really a config-level issue — the default config should either be platform-aware or the documentation should clearly state that ARM64 users need to set db.engine=ROCKSDB. Happy to discuss further in #6587.
There was a problem hiding this comment.
To add more context: the whole point of this series of changes is to eliminate silent behaviors in FullNode. Silently swapping the DB engine in production is exactly the kind of hidden behavior we're trying to remove — it should be strictly prohibited.
If an ARM64 node has db.engine=LEVELDB in config.conf, that's a configuration management problem, not something the application should paper over at runtime. Mixing config-management concerns into production code leads to harder-to-debug issues and undermines the principle of explicit configuration.
The correct fix belongs in the deployment/config layer — ARM64 environments should ship with the right config.conf from the start.
| * Validate final configuration after all sources (defaults, config, CLI) are applied. | ||
| */ | ||
| public static void validateConfig() { | ||
| if (Arch.isArm64() && !"ROCKSDB".equals(PARAMETER.storage.getDbEngine())) { |
There was a problem hiding this comment.
Change "ROCKSDB".equals(PARAMETER.storage.getDbEngine()) to "ROCKSDB".equals(PARAMETER.storage.getDbEngine()).toUpperCase() or Constant.ROCKSDB.equalsIgnoreCase(PARAMETER.storage.getDbEngine()), so that values like rocksdb or RocksDB are also supported, just like https://github.com/vividctrlalt/java-tron/blob/test/dual-db-engine-coverage/framework/src/main/java/org/tron/core/config/args/Args.java#L799.
There was a problem hiding this comment.
Good catch! Will fix to use Constant.ROCKSDB.equalsIgnoreCase() for consistency.
| * Proper fix: TrieImpl.insert() should short-circuit when the existing value | ||
| * equals the new value, avoiding unnecessary invalidate(). See TrieImpl:188-192. | ||
| */ | ||
| @Ignore("TrieImpl bug: root hash depends on insertion order with duplicate key-value puts") |
There was a problem hiding this comment.
Why was @Ignore used instead of addressing the issue in TrieImpl.insert()?
There was a problem hiding this comment.
Good question! This is not a test bug — it's a bug in the production code TrieImpl.insert(). When inserting a duplicate key with the same value, the root hash changes depending on insertion order, which is incorrect.
The comment on @Ignore explains the root cause: TrieImpl.insert() should short-circuit when the existing value equals the new value, avoiding an unnecessary invalidate() call (see TrieImpl:188-192).
Since the scope of this PR is test refactoring (dual DB engine coverage), fixing production code in TrieImpl is out of scope here. The @Ignore annotation documents the known issue so it can be tracked and fixed separately. Also worth noting that this trie code is not yet enabled in production, so the impact is limited for now.
|
Running |
Add assumeLevelDbAvailable() to DBIteratorTest.testLevelDb to skip when LevelDB JNI is unavailable on ARM64 platforms. Co-Authored-By: 3for <zouyudi@gmail.com>
On ARM64, LevelDB is not available at build time. Instead of throwing an exception that breaks default config startup, warn and override to RocksDB. Also uses equalsIgnoreCase for consistency.
The previous check used client-side Socket.connect(), which cannot detect ports in TCP TIME_WAIT state. This caused flaky test failures when gRPC server bind failed on seemingly available ports.
| ExitManager.initExceptionHandler(); | ||
| checkJdkVersion(); | ||
| Args.setParam(args, "config.conf"); | ||
| Args.validateConfig(); |
There was a problem hiding this comment.
-
ARM64 normalization happens too late
On ARM64,dbEngine = LEVELDBis corrected toROCKSDBinvalidateConfig()aftersetParam().
However,applyConfigParamsalready runs before that and checks the engine for RocksDB initialization. Since the engine is stillLEVELDBat that point, RocksDB backup/init logic is skipped. -
Should run after CLI override
applyConfigParamsruns beforeapplyCLIParams, so.confcan be overridden by--storage-db-engine.
The ARM64 normalization should be applied after both config and CLI are resolved. -
Method name is misleading
validateConfigmutates config instead of just validating.
Consider renaming it tonormalizeConfigorfinalizeConfig.
public static void main(String[] args) {
.....
Args.setParam(args, "config.conf");
Args.validateConfig(); // normalizes dbEngine to ROCKSDB on ARM64
......
}
public static void setParam(final String[] args, final String confFileName) {
.....
// 2. Config overrides defaults
applyConfigParams(config);
// 3. CLI overrides Config (highest priority)
applyCLIParams(cmd, jc);
// 4. Init witness (depends on CLI witness flag)
initLocalWitnesses(config, cmd);
}
public static void applyConfigParams(
final Config config) {
......
initBackupProperty(config);
if (Constant.ROCKSDB.equalsIgnoreCase(CommonParameter
.getInstance().getStorage().getDbEngine())) {
// dbEngine is still "LEVELDB" here; RocksDB path is never reached
initRocksDbBackupProperty(config);
initRocksDbSettings(config);
}
}
There was a problem hiding this comment.
Good catch on all three points! Fixed in abb52f6:
- Timing — moved the ARM64 engine override into
setParam(), afterapplyCLIParams()and beforeinitLocalWitnesses(). NowapplyConfigParams()still reads the original config value, but the engine is corrected before any downstream code relies on it. - CLI override — since
applyPlatformConstraints()runs afterapplyCLIParams(), both config and CLI sources are resolved before the ARM64 check. - Naming — renamed
validateConfig()→applyPlatformConstraints()to accurately reflect that it mutates config rather than just validating. Also changed toprivatesince it's now called internally withinsetParam().
| if (appT != null) { | ||
| appT.shutdown(); | ||
| } | ||
| if (context != null) { | ||
| context.close(); | ||
| } |
There was a problem hiding this comment.
When appT != null, there is a potential double shutdown issue: the same Spring singleton bean may be shut down twice.
Specifically, appT.shutdown() is called explicitly, and then context.close() triggers another shutdown via TronApplicationContext, where appT.shutdown() is invoked again.
public final void destroyContext() {
beforeDestroy();
if (appT != null) {
appT.shutdown(); // first shutdown
}
if (context != null) {
context.close(); // triggers second shutdown
}
Args.clearParam();
}There was a problem hiding this comment.
Good catch! Fixed in abb52f6 — removed the explicit appT.shutdown() call. context.close() already triggers appT.shutdown() via TronApplicationContext.destroyBean(), so the explicit call was redundant.
…uble shutdown - Rename validateConfig to applyPlatformConstraints, move into setParam() after applyCLIParams so RocksDB init is not skipped - Remove external validateConfig call from FullNode.main - Fix BaseMethodTest double shutdown by letting context.close() handle appT.shutdown() via TronApplicationContext
| // 2. Config overrides defaults | ||
| applyConfigParams(config); | ||
|
|
||
| // 3. CLI overrides Config (highest priority) | ||
| applyCLIParams(cmd, jc); | ||
|
|
||
| // 4. Init witness (depends on CLI witness flag) | ||
| // 4. Apply platform constraints (e.g. ARM64 forces RocksDB) | ||
| applyPlatformConstraints(); | ||
|
|
||
| // 5. Init witness (depends on CLI witness flag) | ||
| initLocalWitnesses(config, cmd); |
There was a problem hiding this comment.
In Step 2 (applyConfigParams), there is logic that depends on dbEngine. However, in Step 4 (applyPlatformConstraints), dbEngine may still be modified.
This creates an ordering issue. It would be better to move the backup initialization logic that depends on dbEngine out of applyConfigParams and execute it after Step 4, so it operates on the final effective engine.
Alternatively, if there is a better approach, it would be worth considering.
That said, this change would impact existing tests related to applyConfigParams, since it effectively moves both:
- backup-related initialization, and
logConfig()(final config logging)
out of applyConfigParams.
Suggested adjustment:
// Insert into `setParam` AFTER Step 4
// Move out from `applyConfigParams`
initBackupProperty(config);
if (Constant.ROCKSDB.equalsIgnoreCase(
CommonParameter.getInstance().getStorage().getDbEngine())) {
initRocksDbBackupProperty(config);
initRocksDbSettings(config);
}
// Move out from `applyConfigParams`
logConfig(); // log final effective configuration
// Insert into `setParam` BEFORE Step 5It would also be helpful to add tests to verify that, on ARM64, even if LEVELDB is configured:
- the effective
dbEngineis ultimately normalized toROCKSDB, and initRocksDbBackupPropertyandinitRocksDbSettingsare both correctly executed.
There was a problem hiding this comment.
Good point. The real issue is that applyConfigParams shouldn't conditionally skip reading config values based on the current engine. Config parsing should be unconditional — if it's in the config file, read it. The engine selection only matters at runtime when the values are actually used, not at config parsing time.
Fixed in d70c1c4 — removed the if (ROCKSDB) guard around initRocksDbBackupProperty and initRocksDbSettings. These methods only store config values in memory with no side effects, so reading them unconditionally is safe and eliminates the timing dependency entirely.
There was a problem hiding this comment.
“Unconditionally reading and parsing configuration values into memory” is a great approach—simpler and more elegant than the modification I suggested.
|
Based on the latest discussion in [Discussion] Should aarch64 throw an exception when db.engine=LEVELDB? #6587, we still need a more robust and user-friendly way to handle mismatches between the existing data directory and the final effective |
Remove the db engine check around initRocksDbBackupProperty and initRocksDbSettings in applyConfigParams. These methods only store config values in memory without side effects, so reading them unconditionally eliminates the timing issue where ARM64 engine override happens after config parsing.
|
Thanks for the detailed scenario matrix! The "Bad" cases (ARM64 + existing LevelDB data dir) are expected — this is a data migration scenario, not something the engine override should silently handle. Users in this situation need to run Improving the error message in |
Agreed — this issue is out of scope for this PR. |
Summary
New test infrastructure
DbDataSourceImplTest,BaseMethodTest,TestConstantsBaseTestto support DB engine override via Typesafe Config system propertytestWithRocksDbGradle task; CI runs it on x86_64 after default buildFlaky test fixes (20+ classes)
Production code (minimal)
Args.validateConfig()for ARM64 + LevelDB rejection at startupvalidateConfig()inFullNode.main()(production entry only, tests unaffected)ROCKS_DB_ENGINEconstant inStorage.javaTest plan
./gradlew :framework:testWithRocksDbpasses on x86_64